Added more plugins for search/fetch#7
Conversation
- Added `cheerio` for HTML parsing and cleanup - Integrated `turndown` for Markdown conversion with collapsed blank lines - Dropped irrelevant tags and attributes via whitelisting - Improved test coverage to validate functionality
Removed because it can hide Crawl4Ai failures and result in IP blocking
…4ai-fetch` plugin
…s usage in `builtin-parse-htmlToMd` plugin
…y function and updated tests
…ntribution in separate md files
…ch plugins to `builtin-searxng-search` and `builtin-crawl4ai-fetch`
…references to additional configuration options
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughStandardizes PluginContext to a keyed map, adds shared search/display utilities, implements HTML→Markdown parsing, adds/refactors multiple builtin search/fetch plugins with timeouts and parse delegation, updates tests, expands plugin registry/defaults, and adds configuration and contributor docs. ChangesPlugin System Architecture Expansion
Sequence Diagram(s) sequenceDiagram
participant CLI
participant PluginLoader
participant Plugin
participant ParsePlugin
CLI->>PluginLoader: buildPluginContext() (Partial<Record<PluginType, PluginTypeDeclaration>>)
CLI->>Plugin: invoke search/fetch with (input, context)
Plugin->>Plugin: perform HTTP POST/processing (with AbortSignal.timeout)
Plugin->>ParsePlugin: optional parsePlugin.fn(html, context)
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/plugin-loader.test.ts (1)
188-215:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRestore parse-plugin invocation assertion in loader tests
The parse custom-plugin case now checks metadata only; it no longer verifies
fn(html, context)execution, so this contract path can regress unnoticed.Proposed test fix
expect(plugins.slice(0, -1)).toEqual(builtinPlugins); expect(customPlugin?.name).toEqual("test-parse-plugin"); expect(customPlugin?.type).toEqual("parse"); + await expect(customPlugin.fn("<p>hi</p>", context)).resolves.toEqual("parsed <p>hi</p>"); expect(console.warn).not.toHaveBeenCalled(); expect(console.error).not.toHaveBeenCalled();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugin-loader.test.ts` around lines 188 - 215, The test no longer verifies that a custom parse plugin's runtime function is invoked; update the test in src/plugin-loader.test.ts so after obtaining customPlugin (the ParsePlugin created by parseFn and exported as SilbylPlugin) you call its fn with a sample HTML string and a minimal context object and assert the returned value equals the expected "parsed <html>" result, while still asserting no console.warn/error calls; ensure you invoke customPlugin.fn(html, context) (or await it if async) and check the output to restore the parse-plugin invocation contract that loadPlugins() must satisfy.
🧹 Nitpick comments (2)
CLAUDE.md (1)
40-40: ⚡ Quick winStandardize documentation cross-references with consistent path format.
Both files reference user-facing documentation (CONFIGURATION.md, CONTRIBUTION.md, CREATING-PLUGINS.md) but with inconsistent formatting:
CLAUDE.md#L40listsdocs/CONFIGURATION.mdwith the path, but then omits the path for the other two docs.README.md#L89-97uses consistent full paths in the links (docs/CONFIGURATION.md,docs/CREATING-PLUGINS.md,docs/CONTRIBUTION.md).To maintain clarity and consistency, update CLAUDE.md line 40 to use full paths for all three documentation files.
✏️ Proposed fix for CLAUDE.md
-User-facing docs live in `docs/` — `CONFIGURATION.md` (config + per-plugin env-var tables), `CREATING-PLUGINS.md`, and `CONTRIBUTION.md` (linked from `README.md`). +User-facing docs live in `docs/` — `CONFIGURATION.md` (config + per-plugin env-var tables), `docs/CREATING-PLUGINS.md`, and `docs/CONTRIBUTION.md` (linked from `README.md`).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CLAUDE.md` at line 40, Update the reference on CLAUDE.md (currently line with "User-facing docs live in `docs/` — `CONFIGURATION.md` (config + per-plugin env-var tables), `CREATING-PLUGINS.md`, and `CONTRIBUTION.md`") to use full relative paths for all three docs (e.g., `docs/CONFIGURATION.md`, `docs/CREATING-PLUGINS.md`, `docs/CONTRIBUTION.md`) so the line consistently references the docs directory like the links in README.md.src/plugins/builtin-firecrawl-search/main.ts (1)
46-50: Handlesuccess: falsedefensively (likely unnecessary today, but improves robustness)This code already throws on non-2xx HTTP responses (
if (!res.ok) throw ...), and Firecrawl’s API behavior indicatessuccess: falseis returned for error cases alongside non-2xx responses—so the currentNo resultsfallback for an HTTP 200 +success: falseresponse is probably unreachable. Still, adding an explicit guard before checkingdata?.data?.web?.lengthwould make the handler robust to any unexpected 200/error payloads.Optional defensive patch
const data = (await res.json()) as FirecrawlSearchResponse | null; + if (data?.success === false) { + throw new Error("Firecrawl search failed: upstream returned success=false"); + } + if (!data?.data?.web?.length) { return `No results for: ${query}`; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/builtin-firecrawl-search/main.ts` around lines 46 - 50, The current handler returns "No results" when data?.data?.web?.length is falsy but doesn't explicitly guard against a 200 response carrying success: false; update the parsing logic in src/plugins/builtin-firecrawl-search/main.ts (around the code that assigns data from res.json and the subsequent check) to first verify data?.success === true (or explicitly !== false) before inspecting data.data.web; if success is false, treat it as an error/No results path (e.g., return the same `No results for: ${query}` or throw) so that the check in this handler is defensive against 200 responses with success: false. Ensure you reference the same data variable and preserve existing behavior for non-2xx responses.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/CONFIGURATION.md`:
- Around line 86-92: Fix the spelling mistake in the `builtin-firecrawl-fetch`
table description: change "Firecrwawl" to "Firecrawl" in the description for
`SIBYL_FIRECRAWL_FETCH_USE_HTML` so the sentence reads "...otherwise returns the
markdown from Firecrawl with extra blank lines collapsed."
- Around line 20-22: Fix the grammar in the sentence that currently reads "one
of your custom written one!" by changing it to a correct plural or hyphenated
form; e.g., replace that fragment with "one of your custom-written plugins!" (or
"one of your custom written ones!") so it reads: "The value must match a
plugin's `name` (a builtin like `builtin-exa-search`, or one of your
custom-written plugins!)."
In `@docs/CREATING-PLUGINS.md`:
- Around line 81-96: Update the example plugin path string to fix the typo:
change the folder name `my-llm-ask-plugib` to `my-llm-ask-plugin` in the example
block so the ask function `askFn` and exported `SilbylPlugin` reference the
correct example path.
In `@README.md`:
- Around line 79-86: Remove the `ask` command row from the Commands table in
README.md so the table only documents the CLI commands actually implemented;
locate the table containing `search`, `fetch`, `ask`, `--help`, and `--version`
and delete the line starting with the `| `ask` ` row (the row that shows `sibyl
ask ...`) leaving the other rows unchanged.
In `@src/plugins/builtin-crawl4ai-fetch/main.ts`:
- Line 14: Rename the misspelled interface Craw4AiResult to Crawl4AiResult and
update all local references: change the type assertion that currently casts to
Craw4AiResult to cast to Crawl4AiResult, and update the error message text that
mentions "Crawl4AI" internals (currently referencing the old name) so it uses
the corrected spelling/format; keep existing variable names like crawl4AiUrl
unchanged. Ensure the identifier is updated consistently throughout main.ts
where that interface/type is used.
In `@src/plugins/builtin-firecrawl-fetch/main.ts`:
- Around line 25-32: The Firecrawl POST (the line creating "const res = await
fetch(...)" that sends { url, formats: [format] }) can hang; add an
AbortController and a timeout (e.g., 30s or configurable) and pass
controller.signal to fetch so the request is cancelled if it exceeds the
timeout; start a setTimeout that calls controller.abort(), clear the timeout
after fetch completes, and handle the abort/timeout by catching the error
(checking for AbortError) so the CLI fails fast instead of blocking.
In `@src/plugins/builtin-parse-htmlToMd/main.ts`:
- Around line 31-34: The ALLOWED_ATTRS entry for "a" currently permits
preserving href values without checking their scheme; update the
HTML-to-Markdown attribute-preservation logic (the code that reads ALLOWED_ATTRS
and copies anchor attributes) to validate and sanitize href values before
keeping them: allow only safe schemes (e.g., http, https, mailto) and
relative/fragment URLs, and strip or drop hrefs that start with unsafe schemes
such as javascript:, data:, vbscript:, file:, or other non-whitelisted
protocols; implement this as a small helper (e.g., sanitizeHref) invoked when
handling the "a" tag so the ALLOWED_ATTRS mechanism only retains safe hrefs.
In `@src/utils.ts`:
- Around line 31-33: The current parsing for SIBYL_SEARCH_RESULTS_LIMIT uses
Number.parseInt (variables raw/parsed), which accepts partial numerics like
"10abc" and "1.5"; replace that logic with a strict check (e.g., test raw
against /^\d+$/) and then use Number(raw) to get an integer, returning it only
if > 0, otherwise return 10; also add unit tests in the utils test file that
assert inputs "10abc" and "1.5" (and other edge cases like "0" or negative) fall
back to 10 to prevent partial numeric acceptance.
---
Outside diff comments:
In `@src/plugin-loader.test.ts`:
- Around line 188-215: The test no longer verifies that a custom parse plugin's
runtime function is invoked; update the test in src/plugin-loader.test.ts so
after obtaining customPlugin (the ParsePlugin created by parseFn and exported as
SilbylPlugin) you call its fn with a sample HTML string and a minimal context
object and assert the returned value equals the expected "parsed <html>" result,
while still asserting no console.warn/error calls; ensure you invoke
customPlugin.fn(html, context) (or await it if async) and check the output to
restore the parse-plugin invocation contract that loadPlugins() must satisfy.
---
Nitpick comments:
In `@CLAUDE.md`:
- Line 40: Update the reference on CLAUDE.md (currently line with "User-facing
docs live in `docs/` — `CONFIGURATION.md` (config + per-plugin env-var tables),
`CREATING-PLUGINS.md`, and `CONTRIBUTION.md`") to use full relative paths for
all three docs (e.g., `docs/CONFIGURATION.md`, `docs/CREATING-PLUGINS.md`,
`docs/CONTRIBUTION.md`) so the line consistently references the docs directory
like the links in README.md.
In `@src/plugins/builtin-firecrawl-search/main.ts`:
- Around line 46-50: The current handler returns "No results" when
data?.data?.web?.length is falsy but doesn't explicitly guard against a 200
response carrying success: false; update the parsing logic in
src/plugins/builtin-firecrawl-search/main.ts (around the code that assigns data
from res.json and the subsequent check) to first verify data?.success === true
(or explicitly !== false) before inspecting data.data.web; if success is false,
treat it as an error/No results path (e.g., return the same `No results for:
${query}` or throw) so that the check in this handler is defensive against 200
responses with success: false. Ensure you reference the same data variable and
preserve existing behavior for non-2xx responses.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b7dec38e-0c82-48f1-8936-1063a6964128
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (37)
CLAUDE.mdREADME.mddocs/CONFIGURATION.mddocs/CONTRIBUTION.mddocs/CREATING-PLUGINS.mdpackage.jsonsrc/@types/plugin.tssrc/cli.test.tssrc/cli.tssrc/plugin-loader.test.tssrc/plugins/builtin-alterlab-fetch/main.test.tssrc/plugins/builtin-alterlab-fetch/main.tssrc/plugins/builtin-alterlab-search/main.test.tssrc/plugins/builtin-alterlab-search/main.tssrc/plugins/builtin-brightdata-fetch/main.test.tssrc/plugins/builtin-brightdata-fetch/main.tssrc/plugins/builtin-brightdata-search/main.test.tssrc/plugins/builtin-brightdata-search/main.tssrc/plugins/builtin-crawl4ai-fetch/main.test.tssrc/plugins/builtin-crawl4ai-fetch/main.tssrc/plugins/builtin-exa-fetch/main.test.tssrc/plugins/builtin-exa-search/main.test.tssrc/plugins/builtin-exa-search/main.tssrc/plugins/builtin-firecrawl-fetch/main.test.tssrc/plugins/builtin-firecrawl-fetch/main.tssrc/plugins/builtin-firecrawl-search/main.test.tssrc/plugins/builtin-firecrawl-search/main.tssrc/plugins/builtin-parse-htmlToMd/main.test.tssrc/plugins/builtin-parse-htmlToMd/main.tssrc/plugins/builtin-searxng-search/main.test.tssrc/plugins/builtin-searxng-search/main.tssrc/plugins/config.test.tssrc/plugins/config.tssrc/setup.test.tssrc/setup.tssrc/utils.test.tssrc/utils.ts
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/plugins/builtin-brightdata-fetch/main.ts (1)
9-9: ⚡ Quick winAdd explicit return type annotation for stricter type safety.
The
fetchFnfunction should have an explicitPromise<string>return type to align with strict TypeScript configuration and match the pattern in other fetch backends.📝 Suggested fix
-async function fetchFn(url: string, context: PluginContext) { +async function fetchFn(url: string, context: PluginContext): Promise<string> {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/builtin-brightdata-fetch/main.ts` at line 9, Add an explicit Promise<string> return type to the fetchFn signature (async function fetchFn(url: string, context: PluginContext): Promise<string>) to satisfy strict TypeScript typing; update the function signature only and ensure all return points still return strings (or await/throw as appropriate) so the implementation remains consistent with the new Promise<string> annotation.src/plugins/builtin-exa-fetch/main.ts (1)
19-19: Makebuiltin-exa-fetch’sfetchFnsignature consistent withFetchPlugin.fn
FetchPlugin.fnis typed as(url: string, context: PluginContext) => Promise<string>, and the CLI callsfetchPlugin.fn(url, context)—so passingcontextwill not break this plugin at runtime (extra args are ignored whenfetchFnonly acceptsurl). However, for consistency with the other fetch plugins (which do acceptcontext) and to make future context usage possible, updatefetchFnto includecontext: PluginContext(and importPluginContext).🔧 Suggested change
-async function fetchFn(url: string) { +async function fetchFn(url: string, context: PluginContext): Promise<string> {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/plugins/builtin-exa-fetch/main.ts` at line 19, Update the fetch handler to match the FetchPlugin.fn shape by changing the fetchFn signature to accept the second parameter, e.g. fetchFn(url: string, context: PluginContext): Promise<string>, and add an import for PluginContext; keep the existing implementation and return type intact so the function still returns a Promise<string> and will accept the CLI-provided context without runtime issues.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/plugins/builtin-brightdata-fetch/main.ts`:
- Line 9: Add an explicit Promise<string> return type to the fetchFn signature
(async function fetchFn(url: string, context: PluginContext): Promise<string>)
to satisfy strict TypeScript typing; update the function signature only and
ensure all return points still return strings (or await/throw as appropriate) so
the implementation remains consistent with the new Promise<string> annotation.
In `@src/plugins/builtin-exa-fetch/main.ts`:
- Line 19: Update the fetch handler to match the FetchPlugin.fn shape by
changing the fetchFn signature to accept the second parameter, e.g. fetchFn(url:
string, context: PluginContext): Promise<string>, and add an import for
PluginContext; keep the existing implementation and return type intact so the
function still returns a Promise<string> and will accept the CLI-provided
context without runtime issues.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 507bc23e-72f3-4d4f-95fd-ffa8c3112e2e
📒 Files selected for processing (26)
codecov.ymldocs/CONFIGURATION.mddocs/CREATING-PLUGINS.mdsrc/plugins/builtin-alterlab-fetch/main.test.tssrc/plugins/builtin-alterlab-fetch/main.tssrc/plugins/builtin-alterlab-search/main.test.tssrc/plugins/builtin-alterlab-search/main.tssrc/plugins/builtin-brightdata-fetch/main.test.tssrc/plugins/builtin-brightdata-fetch/main.tssrc/plugins/builtin-brightdata-search/main.test.tssrc/plugins/builtin-brightdata-search/main.tssrc/plugins/builtin-crawl4ai-fetch/main.test.tssrc/plugins/builtin-crawl4ai-fetch/main.tssrc/plugins/builtin-exa-fetch/main.test.tssrc/plugins/builtin-exa-fetch/main.tssrc/plugins/builtin-exa-search/main.test.tssrc/plugins/builtin-exa-search/main.tssrc/plugins/builtin-firecrawl-fetch/main.test.tssrc/plugins/builtin-firecrawl-fetch/main.tssrc/plugins/builtin-firecrawl-search/main.test.tssrc/plugins/builtin-firecrawl-search/main.tssrc/plugins/builtin-parse-htmlToMd/main.test.tssrc/plugins/builtin-searxng-search/main.test.tssrc/plugins/builtin-searxng-search/main.tssrc/utils.test.tssrc/utils.ts
✅ Files skipped from review due to trivial changes (2)
- codecov.yml
- docs/CREATING-PLUGINS.md
🚧 Files skipped from review as they are similar to previous changes (19)
- src/plugins/builtin-exa-fetch/main.test.ts
- src/plugins/builtin-exa-search/main.ts
- src/plugins/builtin-alterlab-search/main.ts
- src/plugins/builtin-firecrawl-fetch/main.ts
- src/utils.test.ts
- src/plugins/builtin-firecrawl-search/main.ts
- src/utils.ts
- src/plugins/builtin-searxng-search/main.test.ts
- src/plugins/builtin-brightdata-search/main.ts
- src/plugins/builtin-alterlab-fetch/main.ts
- src/plugins/builtin-brightdata-fetch/main.test.ts
- src/plugins/builtin-exa-search/main.test.ts
- src/plugins/builtin-alterlab-fetch/main.test.ts
- src/plugins/builtin-brightdata-search/main.test.ts
- src/plugins/builtin-alterlab-search/main.test.ts
- src/plugins/builtin-firecrawl-search/main.test.ts
- src/plugins/builtin-searxng-search/main.ts
- src/plugins/builtin-crawl4ai-fetch/main.ts
- src/plugins/builtin-parse-htmlToMd/main.test.ts
Summary by CodeRabbit
New Features
Documentation
Chores